class: center, middle, inverse, title-slide

# FIN7030: Time Series Financial Econometrics 3
## Statistical inference
### Barry Quinn PhD CStat
### 2022-02-06

---
layout: true

<div class="my-footer">
<span>
<a href="https://quinference.com" target="_blank"><b>quinference.com</b></a> - Dr. Barry Quinn
</span>
</div>

---
class: inverse, middle

# Learning Outcomes

.large[
- What is statistical inference?
- Challenges when running regressions
- Building and checking models
- Interpreting models
- Bayesian and classical inference in practice
- Conclusions and questions
]

---
class: middle

# Three challenges of statistics

- .acid[Generalising from sample to population]
- .acid[Generalising from treatment to control group]
- .acid[Generalising from observed measurement to the underlying construct of interest]

.blockquote[All three challenges can be framed as problems of prediction
<footer>Gelman, Hill and Vehtari, 2020 (hereafter ROS)</footer>]

---
class: middle

# Weapon of choice in social science

.glow[regression]

.blockquote[
Regression is a method that allows researchers to summarise how predictions or average values of an *outcome* vary across individuals defined by a set of predictors]

---
class: middle

# Regression uses

.acidline[
- Prediction: *Predicting victory or defeat in a sporting contest*
- Exploring association: *Summarising how well one variable, or set of variables, predicts outcomes, for example risk factor modelling in asset pricing*
- Extrapolation: *Adjusting for known differences between the sample (observed data) and a population of interest, for example adjusting Big Data online survey data for response bias*
- Causal inference: *The most important use: estimating treatment effects by comparing outcomes under treatment and control*
]

---
class: middle

# Weapon of choice in social science

.blockquote.large[A key challenge for causal inference is ensuring that treatment and control groups are similar, on average, before exposure to the treatment, or else
adjusting for differences between groups (ROS)]

---
class: middle

## Challenges in building, understanding, and interpreting regressions

.panelset[
.panel[
.panel-name[Hypothetical example of regression for causal inference]

- Start with the simplest scenario: treatment and control groups that are comparable.
- This condition can be approximated by *randomisation*, a design in which experimental units (in finance we can think of these as firms) are randomly assigned to treatment or control.
- Consider the following hypothetical example where `\(x\)` is a random market shock (the treatment) affecting only certain firms in the UK market (x=0 for control or x=1 for treatment)
]
.panel[.panel-name[Fake data + linear regression with binary predictor]
.pull-left[

```r
library(arm)  # provides display()

N <- 50
x <- runif(N, 1, 5)
y <- rnorm(N, 10 + 3*x, 3)
x_binary <- ifelse(x < 3, 0, 1)
data <- data.frame(N, x, y, x_binary)
lm_1a <- lm(y ~ x_binary, data)
display(lm_1a)
```

```
## lm(formula = y ~ x_binary, data = data)
##             coef.est coef.se
## (Intercept) 15.01     0.73
## x_binary     5.89     1.03
## ---
## n = 50, k = 2
## residual sd = 3.63, R-Squared = 0.41
```
]
.pull-right[
- If we can assume comparability of the groups assigned to different treatments, a regression predicting the outcome given treatment gives us a direct estimate of the causal effect.
- .acidline[We will come back to the important set of assumptions that gives this statistically engineered robot its causal inference power]
- The results opposite show that the treatment has a positive and statistically significant effect on our outcome measure.
]
]
.panel[.panel-name[Visualising data + model]
]
]

---
class: middle

## Linear regression with continuous predictor

.panelset[
.panel[.panel-name[Linear regression output]

```r
lm_1b <- lm(y ~ x, data = data)
display(lm_1b)
```

```
## lm(formula = y ~ x, data = data)
##             coef.est coef.se
## (Intercept) 8.87     1.37
## x           3.16     0.45
## ---
## n = 50, k = 2
## residual sd = 3.30, R-Squared = 0.51
```
]
.panel[.panel-name[Visualise data and model]
<img src="data:image/png;base64,#03-statistical-inference-simulation_files/figure-html/unnamed-chunk-5-1.png" width="60%" />
]
]

---
class: middle

## Non-linear predictor

.panelset[
.panel[.panel-name[Fake data + regression]

```r
y <- rnorm(N, 5 + 30*exp(-x), 2)
data$y <- y
lm_2a <- lm(y ~ x, data = data)
display(lm_2a)
```

```
## lm(formula = y ~ x, data = data)
##             coef.est coef.se
## (Intercept) 13.92     0.86
## x           -2.11     0.28
## ---
## n = 50, k = 2
## residual sd = 2.08, R-Squared = 0.54
```
]
.panel[
.panel-name[Visualise data + *true* non-linear effect]
<img src="data:image/png;base64,#03-statistical-inference-simulation_files/figure-html/unnamed-chunk-7-1.png" width="60%" />
]
.panel[.panel-name[Visualise data + linear effect model]
<img src="data:image/png;base64,#03-statistical-inference-simulation_files/figure-html/unnamed-chunk-8-1.png" width="60%" />
]
]

---
class: middle

## Hypothetical causal adjustment example

.panelset[
.panel[
.panel-name[Fake data with imbalance in groups]

```r
display(lm_2)
```

```
## lm(formula = yy ~ xx + z, data = data)
##             coef.est coef.se
## (Intercept) 20.47     0.60
## xx           4.90     0.33
## z            9.50     0.72
## ---
## n = 100, k = 3
## residual sd = 3.49, R-Squared = 0.77
```

- This hypothetical example can be summarised as follows:
- .blockquote[On average, the treated units were 5.02 points higher than the controls, `\(\bar{y}\)`=32.65 for the treated and `\(\bar{y}\)`=25.52 for the controls.
But the two groups differed in their pre-treatment predictor: `\(\bar{x}\)`=0.55 for treated and `\(\bar{x}\)`=0.55 for controls. After adjusting for this difference, we obtained an estimated treatment effect of 10.0]
]
.panel[
.panel-name[Visualise data + model]
<img src="data:image/png;base64,#03-statistical-inference-simulation_files/figure-html/unnamed-chunk-12-1.png" width="60%" />
]
]

---
class: middle

## Building, interpreting, and checking regression models

* Model building, starting with simple linear models of the form `\(y = a + bx + \text{error}\)` and expanding through additional predictors, interactions, and transformations.
* Model fitting, which includes data manipulation, programming, and the use of algorithms to estimate regression coefficients and their uncertainties and to make probabilistic predictions.
* Understanding model fits, which involves graphics, more programming, and an active investigation of the (imperfect) connections between measurements, parameters, and the underlying objects of study.
* Criticism, which is not just about finding flaws and identifying questionable assumptions, but is also about considering directions for improvement of models. Or, if nothing else, limiting the claims that might be made by a naive reading of a fitted model.

The next step is to return to the model-building step, possibly incorporating new data in this effort.

---
class: middle

## Classical and Bayesian inference

- As open science econometricians we mostly fit models to data and use those models to predict.
- There are three concerns common to all steps in this framework:
  1. What **information** is used in the estimation process
  2. What **assumptions** are made
  3. How estimates and predictions are **interpreted**, in a classical or Bayesian framework

---
class: middle

## Information

- In regressions we usually have data on an outcome variable and one or more predictors.
- As we saw previously, with one predictor `\(x\)`, or with one binary and one continuous predictor, we can visualise their relationship with the outcome variable `\(y\)`.
- In finance we will also have information on what data were observed:
  - Is the data measured at a regular frequency?
  - Is the data free from survivorship bias?
  - Is the data a random or convenience sample?

---
class: middle

### Prior information *learning from experience*

- We may also have *prior knowledge* that comes from sources other than the data, based on experience with previous or similar studies.
- This information should be handled with care, as published research tends to overestimate effect sizes.
- This is due to researchers being under pressure to find large and *statistically significant* results.
- There are settings where local data are weak and it would be foolish to draw conclusions without using prior knowledge.

---
class: middle

## Assumptions

1. The functional form of the regression model; typically **linearity**.
2. Where the data come from: which potential observations are seen and which are not. A strong assumption here would be that there has been random sampling or random treatment assignment. In finance random sampling is rare.
3. The real-world relevance of the measured data; for example, are today's measurements predictive of what happens tomorrow?
   - .acidinline[In time series financial econometrics we assess this statistically by comparing the stability of observations conducted in different ways or at different times.]

---
class: middle

# Classical inference

- Based on summarising the information in the data alone, not using prior information.
- Aims for estimates and predictions that have well-understood statistical properties: low bias and low variance.
- This attitude is sometimes called *frequentist*, in that classical statisticians focus on the long-run expectations of their methods.
- Estimates should be correct on average; **unbiasedness**.
- Confidence intervals should cover the true parameter value 95% of the time.
- An important principle of classical estimates is *conservatism*: when data are weak, prefer cautious claims over strong ones.
- In classical statistics there should be a clear and *objective* path from data to inference, which in turn should be checkable, at least in theory, based on its frequency properties.

---
class: middle

# Bayesian inference

- Goes beyond summarising data to produce statistical inferences that include prior information.
- This information could be awareness of bias, selection on unmeasured characteristics, or prior information on effect sizes.
- One strength of Bayesian inference is that the analysis can provide more reasonable inferences and can be used to make direct predictions about future outcomes.
- One weakness is the need for additional information, the **prior distribution**, which can be contentious in that it makes some claims about the range of the predicted effects.

---
class: middle

## The choice

- .large[Classical inference, leading to pure summaries of data which can have limited value as predictions]
- .large[Bayesian inference, which in theory can yield valid predictions even with weak data, but relies on additional assumptions]
- .acidinline.large[A modern financial data scientist knows there is no universally correct choice, but should be aware of both and use them pragmatically.]
- .blockquote[
- A practical advantage of Bayesian inference is that all inferences are probabilistic and thus can be represented by random simulations.
- For this reason, when we want to summarise uncertainty in estimation beyond confidence intervals, and when we want to use regression models for prediction, **we go Bayesian**
]

---
class: middle

## Computing least squares and Bayesian regressions

.panelset[
.panel[.panel-name[Beauty and teaching evaluations]
<img src="data:image/png;base64,#03-statistical-inference-simulation_files/figure-html/visual data-1.png" width="60%" />
]
.panel[.panel-name[Frequentist inference]
Does beauty predict student evaluations?

```r
display(lm(eval ~ beauty, data = rosdata::beauty))
```

```
## lm(formula = eval ~ beauty, data = rosdata::beauty)
##             coef.est coef.se
## (Intercept) 4.01     0.03
## beauty      0.13     0.03
## ---
## n = 463, k = 2
## residual sd = 0.55, R-Squared = 0.04
```
]
.panel[.panel-name[Bayesian inference]

```
##
## Model Info:
##
##  function:     stan_glm
##  family:       gaussian [identity]
##  formula:      eval ~ beauty
##  algorithm:    optimizing
##  priors:       see help('prior_summary')
##  observations: 463
##  predictors:   2
##
## Estimates:
##               Median   MAD_SD   10%      50%      90%
## (Intercept) 4.0119   0.0246   3.9783   4.0119   4.0437
## beauty      0.1337   0.0331   0.0900   0.1337   0.1741
## sigma       0.5460   0.0168   0.5229   0.5460   0.5692
##
## Monte Carlo diagnostics
##             mcse   khat   n_eff
## (Intercept) 0.0009 0.1144 758
## beauty      0.0012 0.1527 680
## sigma       0.0006 0.1232 722
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and khat is the Pareto k diagnostic for importance sampling (perfomance is usually good when khat < 0.7).
```
]
.panel[.panel-name[Plotting the uncertainty]
<img src="data:image/png;base64,#03-statistical-inference-simulation_files/figure-html/unnamed-chunk-15-1.png" width="60%" />
]
]

---
class: middle, center, hide-logo
background-image: url(img/title_slide.png)
background-size: cover

# .acid[Thank You]
# .glow[Questions?]
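
---
class: middle

## Sketch: representing uncertainty with simulations

The Bayesian panel reports a median and MAD_SD for each coefficient, and the slides note that Bayesian inferences can be represented by random simulations. The chunk below is a minimal base-R sketch of that idea, not the code behind the slides. It assumes fake data, an arbitrary seed, and a flat prior `\(p(\beta, \sigma^2) \propto 1/\sigma^2\)`, for which posterior draws have a closed form. All object names (`beta_sims`, `n_sims`, `nu`, ...) are hypothetical. `stan_glm` uses weakly informative default priors, so its numbers would differ slightly.

```r
set.seed(7030)  # arbitrary seed so the sketch is reproducible

# Fake data (an assumption for illustration): a weak linear relationship
n <- 200
x <- rnorm(n)
y <- 4 + 0.13 * x + rnorm(n, 0, 0.55)

# Least squares fit: the classical summary
fit      <- lm(y ~ x)
X        <- model.matrix(fit)
beta_hat <- coef(fit)
V        <- solve(crossprod(X))          # (X'X)^{-1}
nu       <- fit$df.residual              # residual degrees of freedom
s2       <- sum(residuals(fit)^2) / nu   # residual variance estimate

# Posterior draws under the flat prior
n_sims <- 4000
# sigma^2 given y: scaled inverse chi-square with nu df and scale s2
sigma2 <- nu * s2 / rchisq(n_sims, nu)
# beta given sigma^2 and y: normal, mean beta_hat, covariance sigma^2 * V
L   <- t(chol(V))                        # L %*% t(L) equals V
z   <- matrix(rnorm(n_sims * length(beta_hat)), nrow = length(beta_hat))
dev <- t(L %*% z) * sqrt(sigma2)         # row i is scaled by sigma_i
beta_sims <- sweep(dev, 2, beta_hat, "+")
colnames(beta_sims) <- names(beta_hat)

# Summaries in the style of the Bayesian panel: median and MAD_SD
round(cbind(Median = apply(beta_sims, 2, median),
            MAD_SD = apply(beta_sims, 2, mad)), 3)

# A direct probability statement about the slope
mean(beta_sims[, "x"] > 0)
```

Each row of `beta_sims` is one plausible (intercept, slope) pair, so any summary of uncertainty, such as an interval or the probability that the slope is positive, is just a summary of the rows.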
---
class: middle

### Extra reading (all link to QUB library ebooks)

[Gelman, A., Hill, J., & Vehtari, A. (2020). Regression and Other Stories. Cambridge University Press.](https://www-cambridge-org.queens.ezp1.qub.ac.uk/highereducation/books/regression-and-other-stories/DD20DD6C9057118581076E54E40C372C#overview)

[Cunningham, S. (2021). Causal Inference: The Mixtape. Yale University Press.](https://mixtape.scunning.com/)

[McElreath, R. Statistical Rethinking: A Bayesian Course with Examples in R and Stan.](https://encore.qub.ac.uk/iii/encore/record/C__Rb2089842__Sstatistical%20rethinking__Orightresult__U__X7?lang=eng&suite=def)
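
---
class: middle

### Sketch: adjusting for imbalance between groups

The causal adjustment slide shows only the fitted model `yy ~ xx + z`, not how the fake data were built. The chunk below is one possible sketch of such a simulation. The data-generating values (intercept 20, slope 5 on `xx`, treatment effect 10, residual sd 3), the seed, and the names `fake`, `naive_diff` and `lm_adj` are assumptions chosen to be roughly in line with the displayed coefficients. They are not the author's code.

```r
set.seed(7030)  # arbitrary seed so the sketch is reproducible

N <- 100
z <- rep(0:1, N / 2)   # 0 = control, 1 = treated

# Imbalance: controls tend to have larger values of the pre-treatment predictor
xx <- ifelse(z == 0, rnorm(N, 0, 1.2)^2, rnorm(N, 0, 0.8)^2)

# Assumed true model: slope of 5 on xx and a treatment effect of 10
yy <- rnorm(N, 20 + 5 * xx + 10 * z, 3)
fake <- data.frame(xx, z, yy)

# Naive comparison: difference in mean outcomes, ignoring xx
naive_diff <- mean(fake$yy[fake$z == 1]) - mean(fake$yy[fake$z == 0])

# Pre-treatment imbalance: mean of xx in each group
tapply(fake$xx, fake$z, mean)

# Adjusted comparison: regression on the predictor and the treatment indicator
lm_adj     <- lm(yy ~ xx + z, data = fake)
adj_effect <- coef(lm_adj)[["z"]]

round(c(naive = naive_diff, adjusted = adj_effect), 2)
```

Because the controls start with larger `xx` on average, the raw difference in means mixes the treatment effect with the pre-treatment difference. The coefficient on `z` compares treated and control units at the same value of `xx`.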